19 February, 1985
hdiff 1.14
Purpose
-------
hdiff is a utility which can compare two standard DOS text files
and isolate the differences between them. It can produce two
distinct types of reports on the differences. First, hdiff can
prepare a simple report of lines which appear in the second file
but not in the first (insertions), and of lines which appear in
the first file but not in the second (deletions). Second, hdiff
can produce a special "report" which is, in fact, an Edlin
script. This script, when applied to the first file, will
produce a clone of the second file. This second function of
hdiff is similar to the Unix utility "diff".
hdiff uses a file comparison algorithm which was developed by
Paul Heckel and described by D. E. Cortesi in Dr. Dobb's Journal
#94 (August, 1984). The algorithm is substantially more
efficient than traditional file comparison methods; you will find
that it can generate a difference report between two files in
little more than the time it takes to read the two files.
This version of hdiff was derived from Cortesi's demonstration
program, with substantial modifications which
-- accommodate differences between Edlin and CP/M's Ed (for which
the demo was written)
-- allow use of Edlin's block move capabilities
-- allow for much larger files through the use of all available
memory.
-- allow the use of command line parameters and switches,
including case and spacing insensitivity.
-- allow the user to specify at run time the maximum number of
lines which will be processed. This allows hdiff to use memory
more efficiently.
-- allow the user to request the simpler difference report rather
than the Edlin script.
System requirements
-------------------
hdiff requires:
-- IBM PC, PC/XT, PC/AT, or other MSDOS machine
-- MSDOS 2.00 or later
-- At least 128K of RAM. The more RAM you have, the larger the
files you can process.
Running hdiff
-------------
The general syntax for hdiff is:
hdiff [-ecs] [-nnnn] oldfile.ext newfile.ext
The optional -e switch instructs hdiff to produce an Edlin script
file rather than the difference report.
The optional -c switch instructs hdiff to ignore differences in
case: "HDIFF" is the same as "hdiff".
The optional -s switch instructs hdiff to ignore differences in
spacing; all spaces and tabs are ignored for comparison purposes.
The optional -nnnn switch assists in memory management; it
represents the maximum number of lines hdiff will be required to
process, i.e., the number of lines in the larger of the two
files. The default for this value is 2000 lines; there is an
absolute maximum of 5000 lines. See the section on memory
management for more information about this switch.
The switches may be combined, and they may be in any order: '-e
-c -1000', '-1000ce', and '-ce1000' are all equivalent.
Examples:
hdiff foo.c newfoo.c
compares file 'foo.c' with file 'newfoo.c' and produces a simple
report showing insertions (lines in newfoo which do not appear in
foo) and deletions (lines in foo which do not appear in newfoo).
Lines which have been moved but are otherwise unchanged do not
appear in this report.
hdiff -ec foo.c newfoo.c
compares foo.c with newfoo.c, ignoring case differences, and
prepares an Edlin script. This script, if applied to foo, will
create a copy of newfoo. The script is written to standard output
(normally the console), so a more useful command is
hdiff -e foo.c newfoo.c > foo.dat
which uses standard DOS redirection to send the Edlin script to
the disk file foo.dat. Note that the program logo and error
messages are unaffected by redirection and will always be sent to
the screen.
hdiff -e4000 foo.c newfoo.c > foo.dat
is equivalent to the previous command, except that it informs
hdiff that one of the files might contain up to 4000 lines.
Report formats
--------------
The difference report consists of lines in the format:
nnnn+ text
or
nnnn- text
The '+' format indicates that the line is new (an insertion); the
'-' format indicates that the line is gone (a deletion). Thus:
0001- This line appears in the old file only
0001+ This line appears in the new file only
The 'nnnn' represents the line number. For '+' lines, it's the
line number in the new file; for '-' lines, it's the line number
in the old file.
The Edlin script is a series of Edlin commands; see the Edlin
documentation for their meanings. The only commands which will
appear are I (insert), D (delete), M (move), and E (end). If you
view the script (with an editor or via the TYPE command), it may
look a little strange: after each insertion sequence there is a
heart symbol. This is the screen representation of Ctrl-C, which
is used to terminate an Edlin insertion.
Uses
----
The simplest use for hdiff is to compare two files to see if they
are the same. This can be used to check for corruption during
backups, copies, etc., or to determine which of two files is
newer. Even this simple use of hdiff can be useful in unexpected
ways, however. For example, look at this small batch file:
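rem list drive A; keep lines with a dash (file entries have dated columns)
dir a: > temp
find "-" temp > dir.a
rem do the same for the backup copy on drive B
dir b: > temp
find "-" temp > dir.b
rem '+' lines in temp.bat are files new or changed on A since the backup
hdiff dir.b dir.a > temp.bat
erase dir.a
erase dir.b
erase temp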
This batch file can be used as a simple backup system. Assume that
the default directory in drive A contains a series of files that
you want to back up, and that the default directory in drive B
contains the same set of files from the last backup. The batch
file isolates the differences between the two directories and
prepares a file called temp.bat, which contains a list of those
files which have been changed or added since the last backup.
(The .bat extension is used because most popular text editors can
easily convert temp.bat into a series of COPY commands, which can
then be run as a batch file to perform the copying.)
The "Edlin" mode has potentially much more significant uses.
Perhaps its greatest potential lies in what are known as "source
code control systems". These systems, quite common on mainframes
and minicomputers, allow programmers to maintain many
generations of program source text quite economically; rather
than storing each modified file in its entirety, only the
original is saved, along with a series of difference files.
Hdiff provides a first step in this direction for MSDOS machines
(see the "Plans" section below). Typical use of the current
hdiff would be something like this. Assume that hdiff.scc
contains an "original" version of hdiff; the current version
(1.10) is hdiff.c. First, the command
hdiff -e hdiff.scc hdiff.c > hdiff.110
will create an Edlin script which would convert hdiff.scc into
version 1.10 of hdiff.c. Typically, the actual hdiff.c file
would then be discarded (but see the warning below: this program
is experimental!). As newer versions are developed, the same
procedure is used to create hdiff.111, hdiff.120, etc. Note that
these difference files would, in all likelihood, be much smaller
than the total size of all of the versions.
In order to "retrieve" an earlier version, say 1.00, the command
copy hdiff.scc hdiff.c
edlin hdiff.c < hdiff.100
would convert hdiff.scc into version 1.00 of hdiff.
True source code control systems are considerably more efficient
than this "by hand" method, are much easier to use, and provide
significant features beyond mere storage of multiple versions.
For whatever it's worth, note that
hdiff -e file1 file2 | edlin file1
is roughly equivalent to
copy file2 file1
except that the original file1 is saved in file1.bak.
cdelta and cget
---------------
The two demonstration batches, cdelta and cget, provide a quick
sample of the kinds of things that can be done with hdiff and
edlin. The two batches are designed for C programs; to revise
them for other languages, simply replace all references to ".c"
with the desired extension (.asm, for example).
The purpose of cdelta is to generate a change script which will
convert a "base" source file into a specified version of your
source. Cget performs the inverse task; it applies a specified
change file to the base and produces a file containing the
specified version. File naming conventions are as follows:
file.scc: "base" source; scc = source code control
file.###: A change script to produce version ###
file.c: The current version (cdelta), or the
output file (cget)
For example, suppose you are working with a C program called foo.
A base (earliest) version of this file should be in foo.scc. You
have just finished revision 1.10 of foo. To create the change
file, type
cdelta foo 110
The batch will create a new file, foo.110; this file is an Edlin
script which will convert foo.scc into version 1.10 of foo.c.
To retrieve a specified version, say 1.05, use
cget foo 105
The batch will apply the script foo.105 to foo.scc and produce
foo.c, which will contain the source for version 1.05.
Note that cget always creates a file called file.c, overwriting
any existing file by that name. This implies that you should NOT
keep your current source in file.c; instead, you preserve it only
by retaining file.scc and the delta files.
Memory management
-----------------
Hdiff uses all available memory. The purpose of the -nnnn (max
number of lines) switch is to allow it to use memory more
efficiently, and to allow you to more effectively use hdiff in
very small or very large machines. This is how it works.
For each *potential* line, hdiff requires approximately 34 bytes
of storage for various tables. The default configuration (space
for 2000 lines) will thus require about 68K bytes of data space
for the tables. The remainder of available memory (less the size
of the program itself and a much smaller amount of overhead data)
is used to store the text read from the files. Text storage
space is required for each *unique* line in either file.
If you have a small machine (i.e., less RAM), that much table
space will leave very little room for text storage; it may even
be more space than is available, and the program will not run at
all. If you find this to be the case, try reducing the number of
lines via the switch (-1000, or -500, for example.)
Conversely, if you have a very large machine, you will have
plenty of space available to process files larger than 2000 lines.
If that is the case, increase the maxlines switch as necessary
(but remember that in no case can maxlines exceed 5000).
When hdiff is finished, it displays a message like:
Storage use: 19%
This message tells you approximately what percentage of the total
available memory was actually used.
Restrictions
------------
The following act, in one way or another, as restrictions on
hdiff:
-- File format. Hdiff is intended as a DOS text file comparator
only. It is NOT a replacement for the DOS utility 'comp'. Don't
use it on binary (program or data) files, or on word processor
files if they contain embedded control codes.
-- Available memory (as discussed above)
-- Actual size of the files. Edlin reads a file only until 75%
of its available memory is filled. Since Edlin uses at most 64K,
it will read only about 48K of text. Hdiff cannot account for
this, so the absolute maximum file size it can handle is
approximately 48K.
-- Line size. Limited to a maximum of 255 characters/line.
A Warning and A Plan
--------------------
Hdiff is experimental! It has been in use for about six months
(as of 19 Feb 1985) with no known errors, but this is NOT to
say that you should entrust your only copy of a source file to
hdiff! Please bear this in mind as you use it. Please report
any problems to me.
I intend, at some "unspecified future time", to incorporate hdiff
or a version of it in a larger source code control system. This
system would allow you to maintain multiple generations of
program source files very efficiently (in terms of storage
requirements). Some knotty problems relating to performance on a
standard-issue PC remain to be solved. Comments or suggestions
relating to this system are welcome. Tell me what you would like
to see. In the meantime, a temporary "system" is available in the
file "sccs.lbr", which contains simple versions of get and delta
(written in C for performance reasons).
---------------------
hdiff and this document are
Copyright (c) 1984, 1985 by:
Christopher J. Dunford
10057-2 Windstream Drive
Columbia, Maryland 21044
CompuServe 76703,2002
Source STR211
You may copy and use hdiff for your personal use only. You may
copy hdiff for others, but you may not charge them for it. You
may not use hdiff for any commercial purpose whatsoever. Address
comments to the author at the above address, at CompuServe
(preferably) or at the Source (occasionally).
Hdiff is written in C and compiled using the Computer Innovations
C86 compiler (Version 2.13), large model.